Greg Detre
Wednesday, October 16, 2002
Sorry to have been dragging my feet over the project proposal. Here's a preliminary outline of what I hope might be interesting and feasible to try. I would welcome any criticisms or guidance.
I started by reading some of Luc Steels' papers, but they've been a bit disappointing. I like the questions he's addressing, and his methodological framework (Steels, 1997, 'Synthesising the origins of language and meaning using co-evolution, self-organisation and level formation', http://citeseer.nj.nec.com/steels97synthesising.html), but I suppose I'm disappointed by how much he has to build in to his models in order for the self-organisation to work. For instance, there's a paper in which he tries to self-organise a spatial vocabulary within a language community (Steels 1995, 'A self-organising spatial vocabulary', http://arti.vub.ac.be/steels/space.ps), which works, but only within the crippling constraint of pre-specified meaning sets that the vocabulary maps onto. The only self-organisation that is happening is the mapping of the arbitrary vocabulary symbols onto the pre-specified meaning set (of extremely basic spatial terms like 'front', 'left' etc.), simply by correlating referential 'gestures' with language-symbol 'utterances'.
I'm still very keen to investigate the space-time representation crossover that we discussed with regard to Boroditsky's and Regier's work, but that has had to take a back-burner while I've been caught up with other things. I realise that you'd like to keep that project and the MAS962 project fairly separate, but since I have been thinking about space/time representations a lot, I was hoping you wouldn't mind if I considered that as the domain for the MAS962 project.
Although the Regier & Carlson 'above' model is elegant and rigorous, I feel that it might actually be more complicated than necessary for the sort of hypotheses I'd want to consider first. I thought that it might actually be possible to make progress using a very simple model, similar to the one described in Steels (1995) that I mentioned above. The world would consist of a 1D or 2D spatial grid, discrete timesteps, two agents and a series of objects whose changing location they have to describe to each other.
Steels uses the prepositions 'front', 'left', 'right' and 'back', but since I'm hoping to shift into the temporal domain eventually, I considered that 'near' might be appropriate, simple and fundamental. I'm imagining 'nearness' being defined as just the Euclidean distance from an agent to the object. There would be some threshold radius, within which an object would be considered 'near'. It may also be necessary to define 'far' as the converse of 'near'.
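As a concrete sketch, the 'near'/'far' predicates could be as simple as the following (the radius value and the function names are my own placeholder assumptions, not settled parameters):

```python
import math

NEAR_RADIUS = 5.0  # arbitrary threshold radius; a parameter to tune

def is_near(agent_pos, object_pos, radius=NEAR_RADIUS):
    """True if the object falls within the agent's 'near' radius.

    Positions are tuples of coordinates, so the same predicate works
    unchanged in the 1D and 2D grid worlds described above.
    """
    return math.dist(agent_pos, object_pos) <= radius

def is_far(agent_pos, object_pos, radius=NEAR_RADIUS):
    # 'far' defined simply as the converse of 'near'
    return not is_near(agent_pos, object_pos, radius)
```

Taking positions as coordinate tuples means nothing needs rewriting when the world moves from one dimension to two.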
I haven't read that paper on the semantics of 'in' yet, but I mean to. I'm hoping that 'near' will prove a less problematic notion than you said 'in' is.
It might make sense to start with a one-dimensional world (visualise a bridge) with the objects moving towards the agents from one side. They take it in turns to be able to see the objects coming towards them, and to shout a 'near' warning to the other agent (one agent holds their shared eyeball, while the other operates their shared pogo stick). When an object becomes 'near', they have to 'jump' for the interaction to be a success. Having learned 'near', we might eventually have the objects come from both sides, and use 'front' and 'back'.
Things get interesting when the speed at which the objects approach can vary. At this point, the time-to-impact is no longer simply a function of distance, but also of speed. This would allow us either to redefine 'near' to be 'soon' (i.e. a function that combines either far-and-fast, or close-and-slow, or close-and-fast), or to split it into two notions of (physically) 'close' and 'fast'.
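A minimal sketch of the 'soon' reading, with the threshold value an arbitrary assumption: treating 'soon' as thresholded time-to-impact is exactly what folds far-and-fast and close-and-slow into a single notion.

```python
SOON_THRESHOLD = 3.0  # timesteps; an arbitrary placeholder value

def time_to_impact(distance, speed):
    """Timesteps until the object reaches the agent, assuming constant speed."""
    if speed <= 0:
        return float('inf')  # a stationary or receding object never arrives
    return distance / speed

def is_soon(distance, speed, threshold=SOON_THRESHOLD):
    # far-and-fast (e.g. 9 units at speed 3) and close-and-slow
    # (e.g. 2 units at speed 1) can both come out 'soon'
    return time_to_impact(distance, speed) <= threshold
```

The alternative, splitting into 'close' and 'fast', would just be two separate thresholds on distance and on speed.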
I've described things in this careful, limited-seeming way because this seems like the most intuitive way to build a spatial representation that could be co-opted into the temporal domain. I'm sure you have a clearer idea than I do of how the Regier model might be co-opted into the temporal domain, but I had a really hard time conceiving how it might work at all, at least not without adding enormous extra complication. As it is, I think it may be necessary to move to a 2D spatial world, but I thought it might make things simpler initially if the spatial and temporal representations were of the same dimensionality. I'm also initially only thinking in terms of the agents and objects as points.
I also think it may prove necessary to allow the agents freedom of movement in at least one dimension in order to see anything interesting in the temporal domain, especially with regard to perhaps teasing apart ego- vs time-moving representations. At the moment, this is little more than a vague hunch, but it seems difficult to imagine how an ego-moving representation of time would ever emerge without some notion of the ego being able to move. This may have something to do with the fact that a distinction between egocentric and allocentric coordinates only exists when one can move within the world. I presume it will then also be necessary for the agents to be able to move independently of each other, and to represent each other's position relative to the object (presumably in allocentric coordinates). I suspect that would be complicated, which is why I don't want to introduce it into the model until later.
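For reference, the allocentric-to-egocentric conversion I have in mind is just a translation plus a rotation; a sketch in 2D (the heading convention is my own assumption):

```python
import math

def to_egocentric(agent_pos, agent_heading, point):
    """Convert an allocentric (world-frame) point into the agent's frame.

    Translate so the agent sits at the origin, then rotate so the
    agent's facing direction (heading, in radians anticlockwise from
    the world x-axis) becomes the egocentric +x ('front') axis.
    """
    dx = point[0] - agent_pos[0]
    dy = point[1] - agent_pos[1]
    cos_h, sin_h = math.cos(agent_heading), math.sin(agent_heading)
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)
```

The relevant point is that this mapping only has any content once an agent has a position and heading of its own to transform by; for an immobile agent the two frames never come apart, which is the hunch above in miniature.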
I'm pretty keen for the representations to be connectionist, at least for the most part. This is partly because I worry that any predominantly symbolic representation ends up being spoon-fed its structure by the designer, though I'm aware that much the same could be argued of connectionist representations too. My preference for connectionism is also motivated by an attempt to get away from Steels' use of pre-specified meaning sets. His emphasis is on attaching the vocabulary to these meaning sets by means of a self-organising population-wide convergence. I see this as analogous to the way that bees' decision to move hive results, very suddenly, in a population-wide convergence; this also seems to be the same sort of basic self-organisation that Nowak et al. have in mind in their model of evolutionary population dynamics (2000, http://www.ptb.ias.edu/plotkin/EvolutionOfSyntax.pdf). One means of achieving this self-organising of the meaning sets might be to use a reinforcement learning algorithm, rewarding the agents when they successfully communicate and avoid a near object, as a means of conveying the meaning of 'near'. The opposite approach would be to use a backpropagation net (say) with an explicit teacher/target stimulus to convey the sense of each preposition. I can't see any obvious advantage to this over a symbolic approach, but it may prove the most tractable way to start.
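To make the reinforcement idea concrete, here is a minimal sketch of a two-agent, Steels-style naming game with score-based reinforcement; the vocabulary symbols, reward values and update rule are all placeholder assumptions of mine, and the meanings are still pre-specified, so it illustrates only the convergence mechanism, not the grounding:

```python
import random

WORDS = ['ba', 'du', 'ki']       # arbitrary vocabulary symbols
MEANINGS = ['near', 'far']       # meanings to be agreed upon (still pre-specified)

class Agent:
    def __init__(self):
        # association score between every meaning and every word
        self.scores = {(m, w): 0.0 for m in MEANINGS for w in WORDS}

    def speak(self, meaning):
        # utter the highest-scoring word for this meaning (random tie-break)
        best = max(self.scores[(meaning, w)] for w in WORDS)
        return random.choice(
            [w for w in WORDS if self.scores[(meaning, w)] == best])

    def interpret(self, word):
        # read the word as its highest-scoring meaning (random tie-break)
        best = max(self.scores[(m, word)] for m in MEANINGS)
        return random.choice(
            [m for m in MEANINGS if self.scores[(m, word)] == best])

def play_round(speaker, hearer):
    meaning = random.choice(MEANINGS)
    word = speaker.speak(meaning)
    guess = hearer.interpret(word)
    if guess == meaning:  # successful communication: reinforce the pairing
        speaker.scores[(meaning, word)] += 1.0
        hearer.scores[(meaning, word)] += 1.0
        return True
    # failure: each agent punishes the association it actually used
    speaker.scores[(meaning, word)] -= 0.2
    hearer.scores[(guess, word)] -= 0.2
    return False

a, b = Agent(), Agent()
for _ in range(1000):
    play_round(a, b)
```

After a few hundred rounds the pair settle on distinct words for 'near' and 'far' without the lexicon being imposed by the designer; self-organising the meanings themselves out of the perceptual task, rather than listing them in advance, is the harder step this model is aiming at.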
Just to defend some of the choices I've made philosophically, the assumptions and contentions that I've tried to set out for myself are as follows. I'm afraid they're pretty vague at the moment.
I'm interested in how the adaptive value of language shapes its structure. The notion of 'adaptiveness' implies an evolutionary framework, so maybe it's a misleading term to use. By it, I'm thinking of any approach where the learning algorithm is in some way rewarded/reinforced, with this derived purpose anchoring meaning. Eventually, I'm keen to explore the idea of language games as a means of exploring what sort of selective pressures (i.e. different tasks) might have been operating to produce human language.
(Much of this is drawn from Steels, amongst others).
Language is emergent/distributed
There's no centralised controller or single language community leader, and you can't have a single speaker of a language. When we talk about 'Language', we're referring to the past and possible future interactions between its speakers. I'm hoping that two speakers will be enough for the dynamics to work.
It 'spontaneously forms itself once the appropriate physiological, psychological and social conditions are satisfied' (see Steels, 1997).
Language is too complex to arise out of a generic learning algorithm or cognitive mechanism. There must be some degree of specialisation or hard-wired bias to narrow the search space enough to make learning the language tractable for new speakers. In this model, most of the 'hard-wired bias' will probably be in the form of my tweaking of the agents' cognitive architecture.
Language is the tip of the cognitive iceberg
You can't speak a language without a rich set of separable but densely inter-related concepts that you share to a greater or lesser degree with fellow speakers, and which are grounded in a shared environment and perceptuomotor mechanisms. Language and conceptual structures are interwoven: they influence each other, but you can't talk in terms of one (wholly) determining or preceding the other.
A language community is shaped by the experiences, limitations and nature of embodiment of its individual constituents. It follows then that you can't have even a rudimentary language without also having a functional, if basic, set of concepts and behaviours.
Language is combinatorial
We can, of course, imagine a system of communication which does not involve combining units. For instance, we can imagine some lookup table of arbitrary symbols, each of which stands for some important 'sentence', and it might be pretty adaptive for its speakers, especially if they had large memories and a basic or stable environment.
However, our world is not stable, and even if it were, it's too complex for a lookup table to express usefully without being enormous. As a result, we employ syntax as a means of describing a potentially infinite number of possible combinations between concepts.
So although there are certainly non-combinatorial proto-languages, I can't see how there could be an interesting language that didn't involve combination, or some analogous form of syntax. Put in other terms, without generative rules that can be applied an arbitrary number of times, the representative/expressive capacity of a language will always be finite and impoverished.
At this stage, the questions being investigated in this model are preliminary to the question of syntax, but it may be possible to build up more complex messages such as 'close fast front'.
Whether communication is the main selective pressure behind language, or whether there is some alternative internal benefit to being a language-user.
In this model, communication is the major 'selective pressure' for these agents. At this stage, I don't see how it would be possible to look at the extent to which the proto-language has shaped the concepts it expresses.
I'd really like to consider how questions, statements and imperatives arise and combine to form discourse, but I don't think it will really be an issue I'll be able to address here.
At the very least, it would be enormously satisfying and instructive to build a basic one-dimensional world model, even with a language community of two agents, who can self-organise the prepositions 'near' (maybe 'far'), 'front' and 'back'. This wouldn't be breaking any particularly new ground, although I haven't yet come across a paper that looks at the preposition 'near'.
I feel that the project could take two directions at this point.
I think that investigating the temporal effects would be more likely to work, and might be potentially more interesting. However, I suspect that the most interesting effects, like the ego- vs time-moving representations, would require a pretty advanced cognitive architecture to be seen. Indeed, they must be contingent to some degree on the particular way in which the human brain represents space, time and movement, and I can't yet see how to strip away the extraneous factors to model them. If we were investigating time in a two-dimensional world, then it might eventually also be possible to consider horizontal vs vertical linguistic conceptions of time as well.